Associations between low Apgar scores and mortality by race in the United States: A cohort study of 6,809,653 infants

Background Apgar scores measure newborn health and are strongly associated with infant outcomes, but their performance has been determined largely in predominantly white populations. Given that the majority of the global population is not white, we aim to assess whether the association between low Apgar score and mortality in infants varies across racial groups.
Methods and findings We conducted a population-based cohort study using 2016 to 2017 United States National Vital Statistics System data. The study included singleton infants born between 37+0 and 44+6 weeks to mothers over 15 years, without congenital abnormalities. We looked at 3 different mortality outcomes: (1) early neonatal mortality; (2) overall neonatal mortality; and (3) infant mortality. We used logistic regression to assess the association between Apgar score (categorized as low, intermediate, and normal) and each mortality outcome, adjusted for gestational age, sex, maternal BMI, education, age, previous number of live births, and smoking status, and stratified these models by maternal race group (as self-reported on birth certificates). The cohort consisted of 6,809,653 infants (52.8% non-Hispanic white, 23.7% Hispanic, 13.8% non-Hispanic black, 6.6% non-Hispanic Asian, and 3.1% non-Hispanic other). A total of 6,728,829 (98.8%) infants had normal scores, 63,467 (0.9%) had intermediate scores, and 17,357 (0.3%) had low Apgar scores. Compared to infants with normal scores, low-scoring infants had increased odds of infant mortality. There was strong evidence that this association varied by race (p < 0.001) with adjusted odds ratios (AORs) of 54.4 (95% confidence interval [CI] 49.9 to 59.4) in non-Hispanic white, 70.02 (95% CI 60.8 to 80.7) in Hispanic, 23.3 (95% CI 20.3 to 26.8) in non-Hispanic black, 100.4 (95% CI 74.5 to 135.4) in non-Hispanic Asian, and 26.8 (95% CI 19.8 to 36.3) in non-Hispanic other infants. The main limitation was missing data for some variables, due to using routinely collected data.
Conclusions The association between Apgar scores and mortality varies across racial groups. Low Apgar scores are associated with mortality across racial groups captured by United States (US) records, but are worse at discriminating infants at risk of mortality for black and non-Hispanic non-Asian infants than for white infants. Apgar scores are useful clinical indicators and epidemiological tools; caution is required regarding racial differences in their applicability.



Author summary
Why was this study done?
• Apgar scores are commonly used indicators of infants' well-being at birth and as predictors of mortality and long-term disability.
• Apgar scores have been validated in predominantly white populations.
• The impact of race on the relationship between Apgar score and early neonatal (death within 0 to 6 days of birth), overall neonatal (death within 0 to 27 days), and infant mortality (death within 1 year) is unknown.

What did the researchers do and find?
• We conducted a cohort study of all singleton, term-birth infants born to mothers over 15 years old, without congenital abnormalities, in the United States (US) in 2016 and 2017.
• Our analyses illustrated that race was associated with the assignment of the Apgar score category and with all categories (early neonatal, overall neonatal, and infant mortality) of mortality.
• There was a stronger association between low Apgar score and all categories of mortality among non-Hispanic white, non-Hispanic Asian and Hispanic infants than in non-Hispanic black and non-Hispanic other infants.
What do these findings mean?
• These findings suggest that Apgar scores are less useful for estimating the odds of mortality for non-Hispanic black and non-Hispanic other infants than for non-Hispanic white, non-Hispanic Asian, and Hispanic infants.
• Apgar scores are useful predictors of morbidity and mortality; however, their association with mortality is influenced by infant race. Further work to understand which components of the score explain differential associations is needed for developing a scoring system that performs equally well across racial groups.

Introduction
The Apgar score has been used for nearly 70 years to measure infant health and physical wellbeing immediately after birth and as a predictor of mortality and indicator of an infant's response to resuscitative efforts [1,2]. Apgar scores are widely used in epidemiological studies for providing population-level information about infants' status at birth, predicting neurodevelopmental outcomes and infant mortality and as surrogate markers of morbidity [3][4][5]. In these contexts, Apgar scores are applied across populations, but the tool was developed and validated in predominantly white populations. The score is composed of 5 variables, each with a value of 0, 1, or 2 [6]. The variables are: heart rate, respiratory effort, muscle tone, reflex response, and skin coloring, each assessed at 1, 5 and 10 minutes after birth [6]. Of these, the 5-minute score is regarded as the best predictor of infant mortality and is commonly used in epidemiological studies and trials [6,7]. The overall Apgar score ranges from 0 to 10 and is frequently categorized as low (0 to 3), intermediate (4 to 6), or normal (7 to 10) [6]. These categorizations of the score were originally suggested by Dr. Virginia Apgar and allow comparison between similar studies [4,7,8].
Few studies validating the use of Apgar scores to assess infants' status at birth or to look at the association between Apgar scores and adverse infant outcomes have considered race and ethnicity and none have specifically considered white, black, Asian, Hispanic, and other racial groups individually. Two studies have reported that black infants were assigned lower Apgar scores than white infants [9,10]. Some studies have speculated that this may be due to differential interpretation of the skin color variable in infants that are not of white ethnicity [8][9][10]. Additionally, a study reported that 1-minute Apgar scores were the strongest predictors of infant mortality for Mexican American infants, the worst for black infants, with an intermediate ability for white infants [8].
The lack of understanding around the application of Apgar scores across different race groups is surprising given the wide racial disparities that exist in birth outcomes across many settings including the United States (US) [8,11–15]. Black infants have an infant mortality rate of more than twice that of white infants in the US and have consistently been found to have higher incidences of adverse birth outcomes such as preterm birth, low birth weight, and being small for gestational age [11,16]. Drivers of poor pregnancy outcomes in some race groups are complex and are likely to reflect the interplay of multiple impacts of structural racism including socioeconomic inequalities, access to quality healthcare, and discrimination [17][18][19].
In light of the widespread use of Apgar scores in clinical and epidemiological settings, the lack of research on racial differences in the applicability of the scores and the prevalence of racial disparities in birth outcomes in the US and elsewhere, this study aims to report on the associations between maternal race, 5-minute Apgar scores, and infant mortality. The objectives of this study are to evaluate the association between maternal race and 5-minute Apgar score, the association between maternal race and mortality, and whether there is a differential association between 5-minute Apgar scores and mortality by race. We hypothesize that the associations between 5-minute Apgar scores and early neonatal, overall neonatal, and infant mortality differ by race.

Study design and setting
This population cohort study evaluated all infants born between January 1, 2016 and December 31, 2017 in the US (n = 7,820,866). This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (S1 Table).

Study participants
Inclusion criteria were single births of infants between 37+0 and 44+6 weeks of gestation to mothers older than 15 years who were residents of the US. Births were excluded if they had no recorded gestational age, 5-minute Apgar score, or maternal race. Missing values for maternal age were imputed by the National Center for Health Statistics (NCHS) as the age of the mother from the previous birth record of the same race and birth order in 0.01% of births, and missing values for plurality were imputed by NCHS as singleton births in 0.004% of births. As congenital abnormalities can affect birth outcomes, infants were excluded if they were born with any major congenital abnormality, which NCHS identified as including: anencephaly, meningomyelocele/spina bifida, cyanotic congenital heart disease, congenital diaphragmatic hernia, omphalocele, gastroschisis, limb reduction defect, cleft palate, Down syndrome, suspected chromosomal disorder, and hypospadias. This analysis was restricted to term births because elements of the Apgar score, including tone, color, and reflex irritability, are dependent on physiological maturity, and recommendations on its use in preterm populations vary [20].

Data sources
All data were nonidentifiable and publicly accessible through the NCHS Division of Vital Statistics cohort-linked birth and death database, and complied with the NCHS, Centers for Disease Control and Prevention (CDC) Data User Agreement Terms and Conditions [21][22][23]. The database is composed of data collected directly from US Standard Birth and Death Certificates, including demographic information, and is commonly used in CDC Reports and national studies on infant mortality and neonatal outcomes [24,25]. NCHS linked birth certificate and death certificate data using linking identification numbers, resulting in 22,197 (99.6%) linked death records.

Variables
The outcome of interest was mortality across the first year of life, subdivided into early neonatal mortality (death within 0 to 6 days of birth), overall neonatal mortality (death within 0 to 27 days of birth), and infant mortality (death within 1 year of birth).
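The three nested outcome windows can be expressed as a short sketch. The function and variable names are hypothetical, and two points are assumptions of this illustration rather than details taken from the dataset: age at death is supplied in completed days, and "within 1 year" is treated as days 0 to 364.

```python
from typing import Dict, Optional

# Inclusive upper bounds, in completed days of life, for each outcome window.
# The 6- and 27-day bounds follow the definitions in the text; 364 days for
# "within 1 year" is an assumption of this sketch.
OUTCOME_WINDOWS = {
    "early_neonatal": 6,
    "neonatal": 27,
    "infant": 364,
}


def mortality_flags(age_at_death_days: Optional[int]) -> Dict[str, bool]:
    """Return one flag per outcome; None means no death was recorded."""
    if age_at_death_days is not None and age_at_death_days < 0:
        raise ValueError("age at death cannot be negative")
    return {
        name: age_at_death_days is not None and age_at_death_days <= upper
        for name, upper in OUTCOME_WINDOWS.items()
    }


# The windows are nested: a death on day 3 counts toward all three outcomes.
print(mortality_flags(3))
print(mortality_flags(20))
print(mortality_flags(None))
```

Because the windows are nested, every early neonatal death is also counted as a neonatal death and as an infant death.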
The explanatory variables were 5-minute Apgar score and maternal race/ethnicity, as a surrogate for infant race due to a high frequency of missing data for paternal race. Apgar score was measured by a birth attendant 5 minutes after birth, recorded on the infant's medical chart, and transcribed to the birth certificate by hospital staff using an NCHS facility worksheet [23]. The same worksheet was completed for births outside the hospital [23]. Scores were recorded as whole numbers in the birth certificates and were categorized for this analysis as low (0 to 3), intermediate (4 to 6) and normal (7 to 10). Maternal race was self-reported by the mother by choosing 1 or more of 15 race categories and 5 Hispanic origin categories from the NCHS facility worksheet [23]. The combined race/ethnicity variable used in this analysis was categorized based on the NCHS designations in order to analyze racial groups that can be compared across studies (S2 Table). This analysis considered Hispanic as a separate race/ethnicity category in accordance with NCHS guidelines for this dataset, which utilize single-race categorizations.
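The score categorization described above amounts to a simple lookup, sketched here. The function name is hypothetical, and rejecting out-of-range or non-integer values is a choice made for this illustration (the analysis itself excluded births with no recorded 5-minute score).

```python
def apgar_category(score: int) -> str:
    """Map a whole-number 5-minute Apgar score (0 to 10) to its category."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("Apgar score must be a whole number")
    if not 0 <= score <= 10:
        raise ValueError("Apgar score must be between 0 and 10")
    if score <= 3:
        return "low"            # 0 to 3
    if score <= 6:
        return "intermediate"   # 4 to 6
    return "normal"             # 7 to 10


for s in (0, 3, 4, 6, 7, 10):
    print(s, apgar_category(s))
```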
All multivariable regression models adjusted for the confounding effects of the following covariates: gestational age (continuous scale of whole numbered weeks), fetal sex (male or female), maternal educational attainment (≤eighth grade, ninth to 12th grade without diploma, High School/General Educational Development (GED), Associate's degree, Bachelor's degree, Master's degree, Doctorate or Professional degree, unknown), maternal body mass index (

Statistical methods
The data were analyzed using R version 4.0.2. No overall prospective analysis plan was used. Demographic characteristics were derived for the cohort and each Apgar score group (low, intermediate, normal), with frequencies and percentages reported for categorical variables and means and standard deviations reported for continuous variables.
Univariate logistic regression models were used to quantify the association between each racial group and odds of being assigned a low Apgar score (low versus not low), being assigned an intermediate Apgar score (intermediate versus not intermediate) and to quantify the association between race group and each mortality outcome (early neonatal mortality, neonatal mortality, and infant mortality). Univariate logistic regression models were also used to assess the crude association between each covariate and the mortality outcomes, stratified by race group.
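For a single categorical predictor, the odds ratio from a univariate logistic regression for one group versus the reference group equals the cross-product ratio of the corresponding 2×2 table, and its Wald standard error on the log scale is the square root of the summed reciprocal cell counts. The following is a minimal sketch of that calculation, not the authors' R code; the function name is hypothetical and the counts are invented purely for illustration.

```python
import math


def crude_odds_ratio(group_events, group_nonevents, ref_events, ref_nonevents, z=1.96):
    """Cross-product odds ratio with a Wald confidence interval (log scale)."""
    cells = (group_events, group_nonevents, ref_events, ref_nonevents)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((group_events * ref_nonevents) / (group_nonevents * ref_events))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, for illustration only (not taken from the study):
# 30 low scores among 1,000 infants in one group, 20 among 2,000 in the reference group.
odds_ratio, ci_low, ci_high = crude_odds_ratio(30, 970, 20, 1980)
print(f"OR {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The adjusted models cannot be reproduced this way, because adjustment for covariates requires fitting the full regression.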
Multivariable logistic regression models were fitted to determine the association between Apgar score and each mortality outcome in the total population and stratified by race group. We formally assessed whether there was evidence that the association between Apgar score and mortality varied by race group by including an interaction term in the adjusted model in the total population. Additionally, we conducted a chi-squared test to determine whether there were trends between Apgar score category and early neonatal, overall neonatal, and infant mortality rates within each race group.

Ethics committee approval
The study was sponsored by the University of Edinburgh (reference AC20095). Prior to commencement, the research was subject to the Usher Institute (University of Edinburgh) ethics and data protection oversight process. The ethics and data protection triage and overview self-audit of ethics and data protection issues (completed by EG) confirmed that the proposed research, being secondary analysis of a fully anonymized publicly accessible dataset, posed no foreseeable ethics or data protection risks. This indicated there was no requirement for proceeding to a full formal ethics and data protection review by the Usher Research Ethics Group.

Descriptive cohort characteristics
The NCHS cohort-linked database recorded 7,820,866 live births between January 1, 2016 and December 31, 2017. As shown in Fig 1, 6,809,653 births were eligible for inclusion in this study. Data were missing for Apgar score (0.4%), maternal race (0.9%), and gestational age (0.8%), and these cases were excluded from analysis.

Association between maternal race and 5-minute Apgar score
Race group was associated with the assignment of Apgar score category (p < 0.001). Compared with non-Hispanic white infants, non-Hispanic black infants had 1.7 times the odds of being assigned a low Apgar score (95% confidence interval [CI] 1.6 to 1.8) and non-Hispanic other infants had 1.3 times the odds (95% CI 1.2 to 1.4) (Table 2). Non-Hispanic Asian and Hispanic infants had 23% (95% CI for the odds ratio 0.7 to 0.8) and 30% (95% CI for the odds ratio 0.6 to 0.8) lower odds of being assigned low scores than non-Hispanic white infants, respectively.

Association between maternal race and mortality
There was strong evidence of an association between maternal race and mortality across the first year of life. Compared with non-Hispanic white infants, non-Hispanic black infants had higher odds of all categories of mortality (odds ratios [ORs] for early neonatal, overall neonatal, and infant mortality are reported in the accompanying table). Conversely, non-Hispanic Asian and Hispanic infants had lower odds for all mortality outcomes when compared with white infants. There was no evidence of a difference in early neonatal mortality between non-Hispanic other and white infants, but non-Hispanic other infants had higher odds of neonatal and infant mortality.

Impact of race on the relationship between 5-minute Apgar score and mortality
The early neonatal, overall neonatal, and infant mortality rates decreased with increasing Apgar scores in all races (S3 Table). Across all races, low Apgar score was a strong risk factor for mortality across the first year of life. Low Apgar score was a stronger risk factor than an intermediate score, and there was a strong association between score category and all categories of mortality (p < 0.001; S3 Table).
There was strong evidence that the adjusted association between low Apgar score and mortality varied by race group (p < 0.001) for all mortality outcomes. The adjusted odds ratios (AORs), comparing the odds of infant mortality among infants with low Apgar scores to those with normal Apgar scores, were higher in non-Hispanic Asian (AOR 100.4, 95% CI 74.5 to 135.4), Hispanic (AOR 70.02, 95% CI 60.8 to 80.7), and non-Hispanic white infants (AOR 54.4, 95% CI 49.9 to 59.4) than in non-Hispanic other (AOR 26.8, 95% CI 19.8 to 36.3) and non-Hispanic black infants (AOR 23.3, 95% CI 20.3 to 26.8) (Table 3). Similar associations were present in the early neonatal and overall neonatal mortality categories, with non-Hispanic black and non-Hispanic other groups consistently having the lowest odds ratio for the association between low Apgar score and mortality (Fig 2).
Similar trends of lesser magnitudes persisted in the associations between intermediate scores and all categories of mortality (Table 3). The AORs for the associations between intermediate Apgar score and neonatal mortality varied across race groups (p < 0.001) with AORs of 22.0 (95% CI 19.2 to 25.3) for non-Hispanic white infants, 20.2 (95% CI 16.4 to 24.9) for non-Hispanic black infants, 32.5 (95% CI 20.2 to 52.5) for non-Hispanic Asian infants, 14.0 (95% CI 8.1 to 24.1) for non-Hispanic other infants, and 33.2 (95% CI 26.7 to 41.4) for Hispanic infants. The differences between the race groups in these associations increased across the first year of life (Table 3). The final adjusted models, stratified by race group, are presented in S5-S9 Tables.
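As a rough cross-check of the reported heterogeneity, the stratum-specific infant-mortality AORs for low versus normal scores given in the abstract can be compared with Cochran's Q statistic. This is not the interaction-term test used in the analysis; it is a simpler summary-level substitute that assumes the race strata are independent, recovers each standard error from the rounded 95% CI, and treats the log AORs as approximately normal. All function names are hypothetical.

```python
import math

# Infant-mortality AORs (low vs normal 5-minute Apgar score) with 95% CIs,
# copied from the abstract: (estimate, lower limit, upper limit).
AORS = {
    "non-Hispanic white": (54.4, 49.9, 59.4),
    "Hispanic": (70.02, 60.8, 80.7),
    "non-Hispanic black": (23.3, 20.3, 26.8),
    "non-Hispanic Asian": (100.4, 74.5, 135.4),
    "non-Hispanic other": (26.8, 19.8, 36.3),
}


def cochran_q(estimates, z=1.96):
    """Cochran's Q for heterogeneity of log odds ratios, plus its degrees of freedom."""
    logs, weights = [], []
    for aor, lower, upper in estimates.values():
        se = (math.log(upper) - math.log(lower)) / (2 * z)  # SE recovered from the CI width
        logs.append(math.log(aor))
        weights.append(1.0 / se ** 2)                        # inverse-variance weight
    pooled = sum(w * t for w, t in zip(weights, logs)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, logs))
    return q, len(logs) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


q, df = cochran_q(AORS)
p = chi2_sf_even_df(q, df)
print(f"Q = {q:.1f} on {df} df, p < 0.001: {p < 0.001}")
```

With five strata the statistic has 4 degrees of freedom, which is why the even-df closed form is sufficient here; a general implementation would need the regularized incomplete gamma function.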

Discussion
Overall, we find that low and intermediate Apgar scores are strongly associated with mortality across the first year of life in the US. These findings align with Dr. Apgar's original use of the score to predict neonatal mortality and support the use of the 5-minute Apgar score in research [4,7,26]. These data also illustrate for the first time, to our knowledge, that these strong associations persist across racial groups captured by US birth records. We do, however, also find evidence that there is variation in the relationship between Apgar score and mortality across different racial groups. Non-Hispanic black and non-Hispanic other infants have higher odds of mortality across the first year of life, and are more likely to be assigned a low Apgar score, when compared with white infants; yet, multivariable regression models revealed that the association between low Apgar score and mortality is weakest in non-Hispanic black and non-Hispanic other groups. These findings indicate that low Apgar scores are differentially associated with mortality across race groups and, more specifically, suggest that low Apgar scores discriminate the risk of mortality less well in non-Hispanic black and non-Hispanic other infants than in other race groups.
The findings are consistent with literature suggesting that Apgar scores are strongly associated with mortality across the first year of life [6,13,27,28]. The results also align with research demonstrating that Apgar scores were more predictive of mortality in white and Mexican-American infants than in black infants [8]. This study expanded on the findings of previous research by evaluating more racial groups in a larger and more representative study population. These results add to a body of literature that suggests that the performance of mortality prediction tools in neonatal groups can be influenced by race, as demonstrated by the inclusion of race in the use of an estimator tool for bronchopulmonary dysplasia in preterm infants [29]. The mortality rates in our study population were lower than nationally reported estimates for 2018 of 3.75 neonatal deaths per 1,000 births and 5.64 infant deaths per 1,000 births [24,30]. The difference in rates is likely due to restricting the study population to term infants born without congenital malformations. We report that the 5-minute Apgar score has a strong association with infant mortality in a large multiracial population in a real-world setting. Therefore, the score can be used for informing prognosis and as a valid metric in epidemiological studies. However, there are differences in the strength of association between Apgar score and mortality between racial groups, which should be taken into account in clinical practice and research studies.
Reduced strength of association between the 5-minute Apgar score and mortality in non-Hispanic black and non-Hispanic other groups might be explained in part by systematic differences in assignment of score at birth. The potential differential assignment of the score by race therefore requires further investigation; we suggest that particular consideration be given to individual components of the score, especially the "skin color" component. The "skin color" scheme relies on classifying infants as blue, pale, or pink and is unlikely to be equally efficacious across a range of skin tones, as demonstrated by a recent study stating that a majority of physicians do not agree that "pink all over" is an accurate description of vigorous African-American infants [8,31]. It is possible that refinement of the scoring system to capture circulatory status more reliably could improve its performance in identifying infants at high risk of mortality. However, the reasons for the attenuated association between Apgar score and mortality in non-Hispanic black and non-Hispanic other infants are certainly more complex than just differences in how the score is assigned at birth and are likely to be driven by the social drivers behind poor outcomes in certain race groups, which are not necessarily captured by clinical scores such as the Apgar. Further research is needed to understand the complex social and structural pathways that explain the differential associations between Apgar scores and mortality across racial groups in order to inform future use of the score.
Fig 2. AORs and 95% CIs for early neonatal, overall neonatal, and infant mortality in relation to Apgar score, stratified by maternal race group (n = 6,809,653). AORs of (1) early neonatal mortality; (2) overall neonatal mortality; and (3) infant mortality for infants with low (0 to 3) and intermediate (4 to 6) 5-minute Apgar scores referent to infants with normal (7 to 10) 5-minute Apgar scores. ORs were adjusted for infant sex, maternal age, maternal smoking status, infant birth weight, maternal education, maternal BMI, previous number of live births and gestational age. AOR, adjusted odds ratio; CI, confidence interval; Hisp, Hispanic; NHA, non-Hispanic Asian; NHB, non-Hispanic black; NHO, non-Hispanic other; NHW, non-Hispanic white.
https://doi.org/10.1371/journal.pmed.1004040.g002
This study has a number of strengths. The large sample size derived from a population of all births in the US from 2016 to 2017 allowed for a study cohort that was representative of the US population and minimized selection bias. All of the data were derived from routinely collected NCHS data, minimizing recall and social desirability biases. The present study included 5 maternal racial groups in analysis, allowing for a more thorough analysis of racial differences than previous studies.
The study also has limitations, largely due to the nature of routinely collected data. There were missing values for some covariates, and due to the low frequency of missing values and lack of significant associations between them, these missing values were included as "unknown" categories. The analysis considered maternal race as the infant's race due to a high proportion of missing data for paternal race, which will have resulted in some misclassification for infants whose paternal race differed from maternal race. This analysis excluded preterm births due to concerns about the reliability of the Apgar score in this population; however, a recent study has demonstrated a strong association between low Apgar score and neonatal mortality in preterm infants [3,20]. Further analysis should be conducted to assess how the association between Apgar score and mortality varies by race group among preterm infants.
A further limitation may have been that models were not adjusted for all maternal comorbidities or medication use, because the data to do so reliably were not available.

Conclusions
Overall, low 5-minute Apgar scores are strongly associated with early neonatal, overall neonatal, and infant mortality outcomes in a large multiracial population. There are strong associations between race and the 5-minute Apgar score, with black infants having higher odds of being assigned a low score than their white counterparts. Strong associations also exist between race and mortality within the first year of life, with black and non-Hispanic other infants having the highest odds of neonatal and infant mortality. Importantly, there are racial differences in the strength of the association between Apgar score and mortality. This suggests that while Apgar scores should continue to be used in clinical and research settings, practitioners and researchers should be aware that both the assignment and the predictive ability of the Apgar score vary across racial groups.
Supporting information S1